ITEM VALIDITY BY THE ANALYSIS OF VARIANCE:
AN OUTLINE OF METHOD 


By Kenneth H. Baker
University of Minnesota 


This paper is intended as an aid to the beginner who finds that the appropriate treatment for his problem is an analysis of the variance of his data. Although Snedecor has supplied a very useful handbook¹, most of the problems he works out are agricultural in detail. The present writer has found that many students of psychology and educational psychology have some trouble translating these methods, which were worked out on agricultural problems, into concepts and procedures that apply to their own research situations. Lev² has shown how multiple-choice items for use in a scale may be validated and weighted by an analysis of variance, but he does not list the steps in detail. Since Lev's problem is rather more complex than many beginners will feel competent to handle, the writer feels that a simple problem may do more to introduce the student to methods of analysis that are proving increasingly valuable in the treatment and interpretation of the results of psychological research.

An analysis of variance is most readily performed when the data are in the form of Table I. Variations of this table may be used for both the single and the double criterion of classification and for an equal or unequal number of entries in each of the classes. The entries in the cells of this table may be mean scores, sums of raw scores, proportions, or enumeration data. “Classes” refers to one criterion of classification and “Groups” to another. For example, “Classes” might refer to teaching methods and “Groups” to sections of students. For more complete discussions of the type of data to which this method is applicable, the reader is referred to Fisher³, Snedecor⁴, Goulden⁵, and others.

¹ Snedecor, G. W., Calculation and Interpretation of Analysis of Variance and Covariance. Ames, Iowa: Collegiate Press, 1934. Pp. 96.
² Lev, J., Evaluation of test items by the method of analysis of variance. Jour. Educ. Psychol., 1938, 29, 623-630.
³ Fisher, R. A., Statistical Methods for Research Workers. London: Oliver and Boyd, 1936. Pp. 339, Chapter VII. Also The Design of Experiments. London: Oliver and Boyd, 1937. Pp. 258, Chapters III and IV.
⁴ Snedecor, G. W., Statistical Methods. Ames, Iowa: Collegiate Press, 1937. Pp. 341, Chapters X and XI.
⁵ Goulden, C. H., Methods of Statistical Analysis. New York: John Wiley and Sons, 1939. Pp. 277, Chapter XII.

TABLE I

                      Classes
Groups      1      2      3      4      5
  1        X11    X12    X13    X14    X15
  2        X21    X22    X23    X24    X25
  3        X31    X32    X33    X34    X35
  4        X41    X42    X43    X44    X45

In the present problem, “X” is the proportion of individuals passing a certain item in an examination. “Classes” refers to subdivisions of the criterion, which was the distribution of the sum of the scores on four similar examinations. Grade in the course might have been used as the criterion, but it was rejected since it was a composite evaluation of several kinds of performance. Total score in examinations represents a criterion more closely related to the validity of the items composing the examinations. The distribution of total scores was arbitrarily divided into five criterion classes. Class 1 contained the upper 10% of the distribution; Class 2 the next 15%; Class 3 the next (middle) 50%; Class 4 the next 15%; and Class 5 the lowest 10%. It may be noted that these divisions correspond roughly to the divisions used when letter grades of “A,” “B,” etc. are assigned. The proportion of students passing a given item in each of these classes could be calculated, and a valid item would appear as one in which the proportion of passes decreased from each criterion class to the next, from Class 1 down to Class 5. The standard error of the proportions calculated in this way would be a function entirely of the size of the proportion and the number of cases in each class.
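A minimal Python sketch of the classification just described follows. It assumes that students are ranked by total score and that the 10/15/50/15/10 percent boundaries are applied to rank position; breaking ties by order of listing is our assumption, and the function names are hypothetical. `proportion_passing` then yields the five proportions whose trend from Class 1 to Class 5 indicates whether an item is valid.

```python
def criterion_classes(total_scores):
    """Assign each student to criterion class 1-5 by rank of total score.

    Class 1 = upper 10%, Class 2 = next 15%, Class 3 = middle 50%,
    Class 4 = next 15%, Class 5 = lowest 10%.  Tied scores keep their
    listing order (an assumption made for illustration).
    """
    n = len(total_scores)
    limits = [10, 25, 75, 90, 100]  # cumulative upper limits, in percent
    # Indices from highest to lowest score; Python's sort is stable.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    classes = [0] * n
    for rank, i in enumerate(order):
        # First class whose limit exceeds rank / n (integer arithmetic, no rounding).
        classes[i] = next(c for c, lim in enumerate(limits, start=1)
                          if rank * 100 < lim * n)
    return classes


def proportion_passing(classes, passed, n_classes=5):
    """Proportion passing the item within each criterion class.

    Every class must contain at least one student.
    """
    totals = [0] * n_classes
    passes = [0] * n_classes
    for c, ok in zip(classes, passed):
        totals[c - 1] += 1
        passes[c - 1] += int(ok)
    return [p / t for p, t in zip(passes, totals)]


# Hypothetical data: 20 students with distinct totals; only the top half pass.
scores = list(range(1, 21))
cls = criterion_classes(scores)
props = proportion_passing(cls, [s > 10 for s in scores])
print(props)
```

Integer comparison is used for the boundaries so that a student sitting exactly on a cut-off is not misplaced by floating-point rounding.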

In a problem of this kind, the size of the proportion should have nothing to do with determining the extent of the error of observation. A proportion of 90% passing, for instance, is no more or less significant as such than another proportion of, let us say, 20%. The illusion of the greater reliability of the larger proportion develops from the statistics of inverse probability, and these statistics are not strictly applicable to the present problem. A better way to determine the error of the observed proportion is to make several estimates of the proportion and calculate the error of these estimates in terms of their variance.
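The sketch below contrasts the two ways of stating the error of a proportion that the paragraph distinguishes. It is a hypothetical Python illustration: `binomial_se` is the conventional formula, whose size depends only on p and n, while `se_from_estimates` derives the error from the scatter of several independent estimates. Expressing that error as the standard error of their mean, s/√k, is our assumption, not a formula given in the paper.

```python
import math
import statistics


def binomial_se(p, n):
    """Conventional standard error of a proportion: depends only on p and n."""
    return math.sqrt(p * (1 - p) / n)


def se_from_estimates(estimates):
    """Mean of several independent estimates of a proportion, and the
    standard error of that mean computed from the estimates' own variance."""
    k = len(estimates)
    mean = statistics.mean(estimates)
    var = statistics.variance(estimates)  # unbiased: divisor k - 1
    return mean, math.sqrt(var / k)


# Hypothetical: one criterion class of 40 students split into four groups of 10.
group_props = [0.8, 0.7, 0.9, 0.6]
mean_p, se_groups = se_from_estimates(group_props)
print(round(mean_p, 2), round(se_groups, 4), round(binomial_se(mean_p, 40), 4))
```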

In the present problem, each of the five criterion classes is subdivided into four groups. Two restrictions are to be observed in the subdividing: (1) it must be strictly a randomizing process, and (2) the subdivisions should not contain fewer than ten cases. The first condition can be met easily by using dice or some other equally effective method. When there are fewer than 40 cases in a given criterion class, the second condition can be met by using fewer than four groups for that class. When this is done, a slight correction must be introduced into the procedure to be outlined below. The net result of this method is to give four (instead of one) estimates of the population parameters. The error of observation for the four proportions taken together will, of course, be less than when only one proportion is calculated.
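The random subdivision can be sketched as follows, with Python's pseudorandom `random.Random.shuffle` standing in for the dice (a substitution of ours). The function name and parameters are hypothetical. The number of groups is reduced below four whenever four groups of at least ten cases are impossible; the "slight correction" the text mentions for such classes is not implemented here.

```python
import random


def random_groups(cases, max_groups=4, min_size=10, rng=None):
    """Randomly split one criterion class into at most `max_groups` groups,
    each holding at least `min_size` cases when the class is large enough.

    A class of fewer than 40 cases gets fewer than four groups.
    """
    rng = rng if rng is not None else random.Random()
    n_groups = max(1, min(max_groups, len(cases) // min_size))
    shuffled = list(cases)
    rng.shuffle(shuffled)
    # Deal the shuffled cases out in turn so group sizes differ by at most one.
    return [shuffled[g::n_groups] for g in range(n_groups)]


# Hypothetical class of 45 students identified by number.
groups = random_groups(range(45), rng=random.Random(1))
print([len(g) for g in groups])
```

Dealing cases in turn, rather than cutting the shuffled list into blocks, keeps the groups as nearly equal in size as the class allows.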

In passing, it should be noted that some problems will require that the scores of those successfully answering an item be used rather than a simple enumeration of the number passing. The X's in Table I would then refer to the mean score of those passing an item, whereas in the present problem they refer to the proportion of passes. For the present purposes, the proportion of passes is adequate. If, however, it is desirable to weight an item (in a scale, for instance), the mean score should be used.

To calculate and compare variances, the following steps are recommended. When several items are to be validated, the data of Table I are put on ordinary 3" × 5" or 4" × 6" cards. Table Ia represents such an arrangement, the numbers preceded by the letter “C” representing the various steps in calculation.
TABLE Ia

                           Classes
Groups         1      2      3      4      5       Sum    Sum of squares
  1           X11    X12    X13    X14    X15      C11    C12
  2           X21    X22    X23    X24    X25      C13    C14
  3           X31    X32    X33    X34    X35      C15    C16
  4           X41    X42    X43    X44    X45      C17    C18
Sum           C1     C3     C5     C7     C9       C19
Sum of sq.    C2     C4     C6     C8     C10      C20
Mean          C22    C23    C24    C25    C26      C21

C1  = X11 + X21 + X31 + X41            C11 = X11 + X12 + X13 + X14 + X15
C2  = X11² + X21² + X31² + X41²        C12 = X11² + X12² + X13² + X14² + X15²
C3  = X12 + X22 + X32 + X42            C13 = X21 + X22 + X23 + X24 + X25
C4  = X12² + X22² + X32² + X42²        C14 = X21² + X22² + X23² + X24² + X25²
C5  = X13 + X23 + X33 + X43            C15 = X31 + X32 + X33 + X34 + X35
C6  = X13² + X23² + X33² + X43²        C16 = X31² + X32² + X33² + X34² + X35²
C7  = X14 + X24 + X34 + X44            C17 = X41 + X42 + X43 + X44 + X45
C8  = X14² + X24² + X34² + X44²        C18 = X41² + X42² + X43² + X44² + X45²
C9  = X15 + X25 + X35 + X45            C19 = C1 + C3 + C5 + C7 + C9
C10 = X15² + X25² + X35² + X45²            = C11 + C13 + C15 + C17 (check)

C20 = C19²
C21 = C20 ÷ 20
C22 = C1 ÷ 4     C23 = C3 ÷ 4     C24 = C5 ÷ 4     C25 = C7 ÷ 4     C26 = C9 ÷ 4

* On this page as on following pages the letter C, followed by a figure, represents a step in calculation.
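Steps C 1 through C 26 can also be carried out mechanically. The Python sketch below (function name and data hypothetical) fills in the card for a 4 × 5 table, using the step numbers as dictionary keys, and repeats the C 19 check of row totals against column totals. In code both totals come from the same data, so the check can only catch a transcription error; it is kept to mirror the hand procedure.

```python
import math


def card_steps(x):
    """Return {step: value} for steps C1-C26 of Table Ia.

    x[i][j] is the entry for group i + 1 and class j + 1, as in Table I.
    """
    if len(x) != 4 or any(len(row) != 5 for row in x):
        raise ValueError("expected 4 groups x 5 classes, as in Table I")
    c = {}
    for j in range(5):  # C1..C10: each class (column) sum, then sum of squares
        col = [x[i][j] for i in range(4)]
        c[2 * j + 1] = sum(col)
        c[2 * j + 2] = sum(v * v for v in col)
    for i in range(4):  # C11..C18: each group (row) sum, then sum of squares
        c[11 + 2 * i] = sum(x[i])
        c[12 + 2 * i] = sum(v * v for v in x[i])
    c[19] = c[1] + c[3] + c[5] + c[7] + c[9]
    # Check: the row totals must give the same grand total.
    if not math.isclose(c[19], c[11] + c[13] + c[15] + c[17]):
        raise ArithmeticError("row and column totals disagree")
    c[20] = c[19] ** 2
    c[21] = c[20] / 20  # correction factor; 20 = number of cells
    for j in range(5):  # C22..C26: class (column) means
        c[22 + j] = c[2 * j + 1] / 4
    return c


# Hypothetical proportions passing one item (4 groups x 5 classes).
x = [[0.9, 0.8, 0.6, 0.4, 0.3],
     [1.0, 0.7, 0.5, 0.5, 0.2],
     [0.8, 0.8, 0.6, 0.3, 0.3],
     [0.9, 0.7, 0.5, 0.4, 0.2]]
steps = card_steps(x)
print(round(steps[19], 4), round(steps[21], 4))
```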


The variances may be summarized in the following table on the 
reverse side of each card. 


TABLE II

Source of          Sum of     Degrees of    Mean square    “F”
variation          squares    freedom       (variance)
Total              C27        19
Between classes    C28         4            C32            C36
Between groups     C29         3            C33
Error              C30        12            C34            C37
Error              C31        15            C35            C38
Fiducial limits    C39


C27 = C2 + C4 + C6 + C8 + C10 − C21
    = C12 + C14 + C16 + C18 − C21 (check)
C28 = (C1² + C3² + C5² + C7² + C9²) ÷ 4 − C21
C29 = (C11² + C13² + C15² + C17²) ÷ 5 − C21
C30 = C27 − C28 − C29
C31 = C30 + C29

C32 = C28 ÷ 4
C33 = C29 ÷ 3
C34 = C30 ÷ 12
C35 = C31 ÷ 15

C36 = C32 ÷ C33 (or vice versa, if C33 is larger)
C37 = C34 ÷ C33 (or vice versa, if C33 is larger)
C38 = C32 ÷ C35 (or vice versa, if C35 is larger)
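The steps of Table II can be collected into one function. This Python sketch is hypothetical (names and data are ours). It follows the printed steps literally, including the mean squares compared in C 36, C 37 and C 38 and the rule of putting the larger mean square in the numerator. Degrees of freedom are written for a general table of n_groups × n_classes, which gives 4, 3, 12 and 15 for the 4 × 5 layout; the correction for classes with fewer than four groups is not handled.

```python
def variance_table(x):
    """Steps C27-C38 of Table II for a table x[group][class] laid out as Table I."""
    n_groups, n_classes = len(x), len(x[0])
    col_sums = [sum(row[j] for row in x) for j in range(n_classes)]
    row_sums = [sum(row) for row in x]
    cf = sum(row_sums) ** 2 / (n_groups * n_classes)          # C21
    c = {}
    c[27] = sum(v * v for row in x for v in row) - cf         # total
    c[28] = sum(s * s for s in col_sums) / n_groups - cf      # between classes
    c[29] = sum(s * s for s in row_sums) / n_classes - cf     # between groups
    c[30] = c[27] - c[28] - c[29]                             # error
    c[31] = c[30] + c[29]                                     # error, groups pooled in
    df = {28: n_classes - 1, 29: n_groups - 1,
          30: (n_groups - 1) * (n_classes - 1), 31: (n_groups - 1) * n_classes}
    for ss_step, ms_step in ((28, 32), (29, 33), (30, 34), (31, 35)):
        c[ms_step] = c[ss_step] / df[ss_step]                 # mean squares C32-C35

    def larger_over_smaller(a, b):
        # "(or vice versa, if ... is larger)": the larger mean square goes on top.
        return max(a, b) / min(a, b)

    c[36] = larger_over_smaller(c[32], c[33])
    c[37] = larger_over_smaller(c[34], c[33])
    c[38] = larger_over_smaller(c[32], c[35])
    return c, df


# Hypothetical proportions passing one item (4 groups x 5 classes).
x = [[0.9, 0.8, 0.6, 0.4, 0.3],
     [1.0, 0.7, 0.5, 0.5, 0.2],
     [0.8, 0.8, 0.6, 0.3, 0.3],
     [0.9, 0.7, 0.5, 0.4, 0.2]]
steps, dfs = variance_table(x)
for step in range(27, 39):
    print(f"C{step} = {steps[step]:.6f}")
```

Each “F” would still be referred to the tables of F (or Fisher's z) for the degrees of freedom of its numerator and denominator.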


Notes on calculations: The operations outlined here are arranged for machine calculation. Calculations C 1 through C 18 may be done in pairs, i.e., C 1 with C 2, C 3 with C 4, etc.

The check for calculation C 19 confirms only the correctness of the totals for the rows and columns. The check for the sums of squares is not made until calculation C 27, although the sums of squares are seldom in error if the calculator is operating correctly.


Calculation C 21 gives the correction factor. The “20” in this operation refers, of course, to the number of cells in the original table. It does not refer to the number of cases or to calculation C 20.

Calculations C 22 through C 26 are for the means of the columns. In a complete analysis of a double criterion of classification, means of the rows would also be calculated. The latter means are of no value in the present problem, since the groups were formed by a purely random process.

Before the division is performed in calculations C 28 and C 29, 
the item-count dial of the calculator should show a figure equal to 
C 19. 

Calculations C 30 and C 31 are obtained by subtraction and will 
be incorrect if previous calculations are inaccurate. It would seem 
that the checks introduced so far are adequate to minimize the 
possibility of error. 

Calculation C 36 gives an “F” which tells whether the variance between the criterion classes is significantly greater than the variance within these classes.

Calculation C 37 gives an “F” which tells whether the variance attributable to the randomization process is significantly different from the variance from other sources. This is a rough check on the adequacy of the size of the sample upon which each of the proportions is based. In a strict analysis of a double criterion of classification, this “F” tells whether the group means come from a homogeneous population of means.

Calculation C 38 gives the most important “F” for an item validity study, since the “Error” here includes the variance attributable to all sources other than the criterion classification.

Calculation C 39 gives the amount by which the column means must differ in order to satisfy the 1% level of significance for the appropriate (in this case, six) degrees of freedom. The calculation is based on the assumption that the variances of the classes do not differ significantly, i.e., that the variance of the variances is low. A rough check of the truth of this assumption may be made by applying a chi-square test:

$$\chi^2 = \frac{\sum_{j} \left(\sigma_j^2 - \bar{\sigma}^2\right)^2}{m},$$

where $\sigma_j^2$ is the variance within class $j$, $\bar{\sigma}^2$ is the mean of these variances, and $m$ is the theoretical variance for the total population, the best estimate of which is

$$m = \frac{2\bar{\sigma}^4}{k - 1},$$

in which $\bar{\sigma}^4$ is the square of the mean of the variances within the classes and $k$ is the number in the groups. When the number in the groups is unequal, $k$ can be estimated by taking the harmonic mean. This test of the homogeneity of the variances is usually not necessary, although it should be made if a thorough analysis is to be carried out.
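The homogeneity check can be sketched in Python as below. It is an illustration under stated assumptions, not the author's worked procedure: the statistic is taken as the sum of squared deviations of the class variances from their mean, divided by m ≈ 2σ̄⁴/(k − 1), the usual variance of a sample variance for normally distributed data; k is read as the number of groups in each class (four here); and the function name and data are hypothetical. Where classes have unequal numbers of groups, `statistics.harmonic_mean` could supply k, as the text suggests.

```python
import statistics


def variance_homogeneity_chi2(x):
    """Rough chi-square check that the variances within the classes are homogeneous.

    x[i][j] is the entry for group i + 1 in class j + 1, laid out as in Table I.
    Assumes every class has the same number of groups, k.
    """
    k = len(x)
    # Variance of the k group entries within each class (divisor k - 1).
    variances = [statistics.variance(column) for column in zip(*x)]
    mean_var = statistics.mean(variances)
    # Assumed estimate of m: 2 * (mean variance) ** 2 / (k - 1).
    m = 2 * mean_var ** 2 / (k - 1)
    chi2 = sum((v - mean_var) ** 2 for v in variances) / m
    return chi2, variances


# Hypothetical proportions passing one item (4 groups x 5 classes).
x = [[0.9, 0.8, 0.6, 0.4, 0.3],
     [1.0, 0.7, 0.5, 0.5, 0.2],
     [0.8, 0.8, 0.6, 0.3, 0.3],
     [0.9, 0.7, 0.5, 0.4, 0.2]]
chi2, class_variances = variance_homogeneity_chi2(x)
print(round(chi2, 3))
```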

In conclusion, it must be pointed out that, although this method of analysis milks the data of every bit of information they contain, it is impractical from the standpoint of time unless the items to be validated are to be used in some sort of scale. To use this method in the validation of items in an ordinary classroom examination requires more time than the results of the analysis would justify. On the other hand, for the study of items in questionnaires, scales, and other research instruments, the analysis of variance is most effective.

